Cancer waiting times projections over the parliamentary term
Context
During the run up to general election 2024, Cancer Research UK wanted to keep cancer high on the political agenda.
Waiting times had long been a pressing issue, and likely to generate press
The project
Cancer waiting times over the next parliamentary term were identified as a good fit, looking at the size of demand for NHS services over the next 5 years if things don’t change.
This project focused on three key points:
- Number of urgent cancer referrals being made
- Number of people starting treatment - 31 day wait metric
- How many won’t start their treatment within the NHS England target window - 62 day wait metric
Data: NHS England Cancer Waiting Times metrics (publicly available dataset)
For each metric, I ran 3 models each starting with a different time period 2009 (start of recorded data), 2019 (five years of data), 2021 (post-COVID) and predicting the total number of patients for each month.
Example code for models
#model from different start points with glm
total_model <- glm(total ~ monthly, gaussian(),
data = USCR_predictions_ENG %>%
filter(!is.na(total)))
total_model_from19 <- glm(total ~ monthly, gaussian(),
data = USCR_predictions_ENG %>%
filter(!is.na(total) & monthly > "2019-01-01"))
#Make predictions
USCR_predictions_ENG <- USCR_predictions_ENG %>%
mutate(
predicted_total = predict(total_model,
newdata = USCR_predictions_ENG,
type = "response"),
predicted_total_from19 = predict(total_model_from19,
newdata = USCR_predictions_ENG,
type = "response")
)Urgent cancer referrals
Full referrals modelling code
CWTUSC <- read_xlsx("data/CWT-CRS-National-Time-Series-Oct-2009-Jun-2024-with-Revisions.xlsx",
sheet = 5, range = "B4:C181") %>%
clean_names() %>%
mutate(monthly = as.Date(monthly))
## Modelling
USCR_predictions_ENG <- CWTUSC %>%
mutate(month_no = 1:nrow(CWTUSC)) %>%
select(monthly, total) %>%
arrange(monthly)
future_dates <- data.frame(
monthly = seq.Date(from = max(CWTUSC$monthly) + months(1),
by = "month", length.out = 69)
)
USCR_predictions_ENG <- bind_rows(USCR_predictions_ENG, future_dates)
# Fit regression models to each list element using non NA i.e. historic data for COUNT in order to be used to offset the outside_target model
total_model <- glm(total ~ monthly, gaussian(),
data = USCR_predictions_ENG %>%
filter(!is.na(total)))
total_model_from19 <- glm(total ~ monthly, gaussian(),
data = USCR_predictions_ENG %>%
filter(!is.na(total) & monthly > "2019-01-01"))
total_model_from21 <- glm(total ~ monthly, gaussian(),
data = USCR_predictions_ENG %>%
filter(!is.na(total) & monthly > "2021-01-01"))
#Make predictions
USCR_predictions_ENG <- USCR_predictions_ENG %>%
mutate(
predicted_total = predict(total_model,
newdata = USCR_predictions_ENG,
type = "response"),
predicted_total_from19 = predict(total_model_from19,
newdata = USCR_predictions_ENG,
type = "response"),
predicted_total_from21 = predict(total_model_from21,
newdata = USCR_predictions_ENG,
type = "response")
)Rounding function
CRUKRoundingPerYear <- function(number){
# number <- 10050000
number <- case_when(
number >= 100000000 & number - floor(number / 1000000) * 1000000 == 500 ~ number + 1,
number >= 100000000 ~ number,
number >= 10000000 & number - floor(number / 100000) * 100000 == 500 ~ number + 1,
number >= 10000000 ~ number,
number >= 1000000 & number - floor(number / 10000) * 10000 == 500 ~ number + 1,
number >= 1000000 ~ number,
number >= 100000 & number - floor(number / 1000) * 1000 == 500 ~ number + 1,
number >= 100000 ~ number,
number >= 10000 & number - floor(number / 100) * 100 == 50 ~ number + 1,
number >= 10000 ~ number,
number >= 1000 & number - floor(number / 100) * 100 == 50 ~ number + 1,
number >= 1000 ~ number,
number >= 100 & number - floor(number / 10) * 10 == 5 ~ number + 0.01,
number >= 100 ~ number,
number >= 10 & number - floor(number / 1) * 1 == 0.5 ~ number + 0.0001,
number >= 10 ~ number,
TRUE ~ number)
numbertext <- case_when(
number >= 100000000 & number - floor(number/1000000)*1000000 >= 900 ~
paste0("around ", format(signif(number, digits = 3),
big.mark=",", scientific = FALSE)),
number >= 100000000 & number - floor(number/1000000)*1000000 >= 700 ~
paste0("nearly ", format(signif(number, digits = 3),
big.mark=",", scientific = FALSE)),
number >= 100000000 & number - floor(number/1000000)*1000000 >= 200 ~
paste0("more than ", format(signif(floor(number/100000)*100000, digits = 3),
big.mark=",", scientific = FALSE)),
number >= 100000000 ~
paste0("around ", format(signif(floor(number/100000)*100000, digits = 3),
big.mark=",", scientific = FALSE)),
number >= 10000000 & number - floor(number/100000)*100000 >= 900 ~
paste0("around ", format(signif(number, digits = 3),
big.mark=",", scientific = FALSE)),
number >= 10000000 & number - floor(number/100000)*100000 >= 700 ~
paste0("nearly ", format(signif(number, digits = 3),
big.mark=",", scientific = FALSE)),
number >= 10000000 & number - floor(number/100000)*100000 >= 200 ~
paste0("more than ", format(signif(floor(number/100000)*100000, digits = 3),
big.mark=",", scientific = FALSE)),
number >= 10000000 ~
paste0("around ", format(signif(floor(number/100000)*100000, digits = 3),
big.mark=",", scientific = FALSE)),
number >= 1000000 & number - floor(number/10000)*10000 >= 900 ~
paste0("around ", format(signif(number, digits = 3),
big.mark=",", scientific = FALSE)),
number >= 1000000 & number - floor(number/10000)*10000 >= 700 ~
paste0("nearly ", format(signif(number, digits = 3),
big.mark=",", scientific = FALSE)),
number >= 1000000 & number - floor(number/10000)*10000 >= 200 ~
paste0("more than ", format(signif(floor(number/10000)*10000, digits = 3),
big.mark=",", scientific = FALSE)),
number >= 1000000 ~
paste0("around ", format(signif(floor(number/10000)*10000, digits = 3),
big.mark=",", scientific = FALSE)),
number >= 100000 & number - floor(number/1000)*1000 >= 900 ~
paste0("around ", format(signif(number, digits = 3),
big.mark=",", scientific = FALSE)),
number >= 100000 & number - floor(number/1000)*1000 >= 700 ~
paste0("nearly ", format(signif(number, digits = 3),
big.mark=",", scientific = FALSE)),
number >= 100000 & number - floor(number/1000)*1000 >= 200 ~
paste0("more than ", format(signif(floor(number/1000)*1000, digits = 3),
big.mark=",", scientific = FALSE)),
number >= 100000 ~
paste0("around ", format(signif(floor(number/1000)*1000, digits = 3),
big.mark=",", scientific = FALSE)),
number >= 10000 ~
paste0("around ", format((round(number/100, 0)*100), big.mark=",")),
number >= 1000 ~
paste0("around ", format((round(number/100, 0)*100), big.mark=",")),
number >= 100 ~
paste0("around ", round(number/10, 0)*10),
number >= 10 ~
paste0("around ", round(number/5, 0)*5),
number >= 7.5 ~
paste0("around 10"),
number >= 5 ~
paste0("around 5"),
TRUE ~
paste0("less than 5"))
return(numbertext)
}Visualising the different referrals models
predictions_ENG_summary <- USCR_predictions_ENG %>%
filter(monthly > "2024-06-01" & monthly < "2029-07-01") %>%
summarise(predicted_total = sum(predicted_total),
predicted_total_from19 = sum(predicted_total_from19),
predicted_total_from21 = sum(predicted_total_from21))
plot_ly(USCR_predictions_ENG) %>%
add_trace(x = ~monthly,
y = ~predicted_total,
name = paste0("2009 - ",
CRUKRoundingPerYear(predictions_ENG_summary$predicted_total),
" people"),
line = list(color = "#ff0087", dash = 'dash'),
type = 'scatter',
mode = 'lines') %>%
add_trace(data = filter(USCR_predictions_ENG, monthly >= "2019-01-01"),
x = ~monthly,
y = ~predicted_total_from19,
name = paste0("2019 - ",
CRUKRoundingPerYear(predictions_ENG_summary$predicted_total_from19),
" people"),
line = list(color = "#009cee", dash = 'dash'),
type = 'scatter',
mode = 'lines') %>%
add_trace(data = filter(USCR_predictions_ENG, monthly >= "2021-01-01"),
x = ~monthly,
y = ~predicted_total_from21,
name = paste0("2021 - ",
CRUKRoundingPerYear(predictions_ENG_summary$predicted_total_from21),
" people"),
line = list(color = "#00007e", dash = 'dash'),
type = 'scatter',
mode = 'lines') %>%
add_trace(data = filter(USCR_predictions_ENG, monthly >= "2009-01-01"),
x = ~monthly,
y = ~total,
name = paste0("Historical data"),
line = list(color = "#000000"),
type = 'scatter',
mode = 'lines') %>%
config(displaylogo = FALSE,
toImageButtonOptions = list(
format = "png",
filename = paste0("England USC models"),
width = 1400,
height = 600)) %>%
layout(
showlegend = TRUE,
legend = list(
orientation = "h",
y = -0.1,
title = list(text = "Model starting year and predicted number of patients over next parliamentary term",
font = list(family = "Poppins", color = "#000000", size = 18),
side = "top"),
font = list(family = "Poppins", color= "#000000", size= 14)
),
yaxis = list(
title = "Number of people",
showgrid = T,
zeroline = T,
tickmode = 'auto',
tickformat = '~s',
nticks = 5,
range = c(0, 400000),
titlefont = list(family = "Poppins", color = "#000000", size = 18),
tickfont = list(family = "Poppins", color= "#000000", size= 14)),
xaxis = list(
title = "",
range = c("2010-01-01", "2030-01-01"),
tickmode = 'auto',
nticks = 12,
showgrid = F,
tickfont = list(family = "Poppins", color= "#000000", size= 14),
titlefont = list(family = "Poppins", color = "#000000", size = 18)),
title = list(text = "Number of urgent cancer referrals<br><sup>England</sup>",
font = list(family = "Progress Medium",
color = "#000000",size = 28),
xref = 'container',
xanchor = 'left',
x = 0.01, y= 0.95),
margin = list(t = 80, b = 30)
)Getting final numbers
## To get new aggregate number
England_USC_prediction <- USCR_predictions_ENG %>%
filter(monthly > "2024-06-01" & monthly < "2029-07-01") %>%
summarise(
predicted_referrals_09 = sum(predicted_total),
rounded_09 = CRUKRoundingPerYear(predicted_referrals_09),
predicted_referrals_19 = sum(predicted_total_from19),
predicted_referrals_21 = sum(predicted_total_from21)
)
AnnualPercChange <- USCR_predictions_ENG %>%
#mutate(year = as.numeric(format(date,'%Y'))) %>%
filter(monthly >= "2023-07-01" & monthly < "2029-07-01") %>%
mutate(year = case_when(monthly > "2023-06-01"
& monthly < "2024-07-01" ~ "2023/24",
monthly > "2024-06-01"
& monthly < "2025-07-01" ~ "2024/25",
monthly > "2025-06-01"
& monthly < "2026-07-01" ~ "2025/26",
monthly > "2026-06-01"
& monthly < "2027-07-01" ~ "2026/27",
monthly > "2027-06-01"
& monthly < "2028-07-01" ~ "2027/28",
monthly > "2028-06-01"
& monthly < "2029-07-01" ~ "2028/29")) %>%
group_by(year) %>%
mutate(total = coalesce(total, predicted_total)) %>%
summarise(total = sum(total))
AnnualPercChange <- AnnualPercChange %>%
mutate(percent_inc = total / AnnualPercChange[AnnualPercChange$year == "2023/24",]$total * 100,
count_diff = total - lag(total, n = 1L),
perc_diff = percent_inc - lag(percent_inc, n = 1L),
yearno = row_number())
#increase in referrals at the end of term compared to 23/24
USCPercChange <- round((AnnualPercChange[AnnualPercChange$year == "2028/29",]$total
/ AnnualPercChange[AnnualPercChange$year == "2023/24",]$total - 1) *100, 0)
#Number
ExtraReferrals <- (AnnualPercChange[AnnualPercChange$year == "2028/29",]$total
- AnnualPercChange[AnnualPercChange$year == "2023/24",]$total)Over the next five years, there are projected to be around 17,200,000 urgent suspected cancer referrals in England.
By 2029, there would be 21% more urgent cancer referrals than there were during the last year, which would be more than 648,000 people being referred for urgent suspected cancer.
31 day waits
31 day modelling
CWT31 <- read_csv("data/CWT_31day_Overall_2024-09-04.csv",
show_col_types = FALSE) # From sst hub
## Filtering and tidying names of main data
# Going to use only First treatments to match historic and other nations data like picking USC route for 62 decision.
CWT31ENG <- CWT31 %>%
filter(Country == "England" & FirstOrSubsequent == "First Treatment") %>%
select(1:8, -Country, -FirstOrSubsequent) %>%
clean_names() %>%
mutate(date = format(floor_date(parse_date_time(date, orders = "my"),
unit = "month")) %>%
as.Date(format = "%Y-%m-%d"))
## Modelling
predictions_ENG <- CWT31ENG %>%
mutate(month_no = 1:nrow(CWT31ENG)) %>%
select(date, count, outside_target, month_no) %>%
arrange(date)
future_dates <- data.frame(date = seq.Date(from = max(CWT31ENG$date) + months(1),
by = "month", length.out = 69),
month_no = seq(max(predictions_ENG$month_no) + 1,
max(predictions_ENG$month_no) + 69,
length.out = length((max(predictions_ENG$month_no)+1):(max(predictions_ENG$month_no) + 69))),
outside_target = NA,
count = NA)
predictions_ENG <- bind_rows(predictions_ENG, future_dates)
# Fits guassian regression models to each list element using non NA i.e. historic data for COUNT in order to be used to offset the outside_target model
Model_31_2009 <- glm(count ~ date, gaussian(),
data = predictions_ENG %>%
filter(!is.na(count)))
Model_31_2019 <- glm(count ~ date, gaussian(),
data = predictions_ENG %>%
filter(!is.na(count) & date > "2019-01-01"))
Model_31_2021 <- glm(count ~ date, gaussian(),
data = predictions_ENG %>%
filter(!is.na(count) & date > "2021-01-01"))
predictions_ENG <- predictions_ENG %>%
mutate(predicted_count_2009 = predict(Model_31_2009,
newdata = predictions_ENG,
type = "response"),
predicted_count_2019 = predict(Model_31_2019,
newdata = predictions_ENG,
type = "response"),
predicted_count_2021 = predict(Model_31_2021,
newdata = predictions_ENG,
type = "response")
) %>%
mutate(performance = 1-(outside_target/count),
average_6m_count = rollmean(count, 6,
fill = NA, align = "right"),
average_6m_outside = rollmean(outside_target, 6,
fill = NA, align = "right"),
average_6m_perf = 1-(average_6m_outside/average_6m_count),
multiplier = last(na.omit(average_6m_perf)))
## To get new aggregate number
predictions_ENG_summary <- predictions_ENG %>%
filter(date > "2024-06-01" & date < "2029-07-01") %>%
summarise(predicted_count_2009 = sum(predicted_count_2009),
predicted_count_2019 = sum(predicted_count_2019),
predicted_count_2021 = sum(predicted_count_2021))
England_outside <- predictions_ENG %>%
filter(date > "2024-06-01" & date < "2029-07-01") %>%
mutate(standard = 0.95,
additional_patients = (standard - multiplier) * predicted_count_2009) %>% #Adds in how many patients needed to meet target
summarise(additional_patients = sum(additional_patients)) %>%
pull()
EnglandAnnualFigures <- predictions_ENG %>%
mutate(year = as.numeric(format(date,'%Y'))) %>%
filter(date >= "2023-07-01" & date < "2029-07-01") %>%
mutate(year = case_when(date > "2023-06-01" & date < "2024-07-01" ~ "2023/24",
date > "2024-06-01" & date < "2025-07-01" ~ "2024/25",
date > "2025-06-01" & date < "2026-07-01" ~ "2025/26",
date > "2026-06-01" & date < "2027-07-01" ~ "2026/27",
date > "2027-06-01" & date < "2028-07-01" ~ "2027/28",
date > "2028-06-01" & date < "2029-07-01" ~ "2028/29")) %>%
group_by(year) %>%
mutate(count = coalesce(count, predicted_count_2009)) %>%
summarise(count = sum(count))Visualising the different 31 day models
#graph
eng_graph <- ggplot(predictions_ENG, aes(date)) +
geom_line(aes(date, predicted_count_2009,
colour = paste0("2009 - ",
CRUKRoundingPerYear(predictions_ENG_summary$predicted_count_2009))),
linetype = "solid") +
geom_line(data = filter(predictions_ENG, date > "2018-12-31"),
aes(date, predicted_count_2019,
colour = paste0("2019 - ",
CRUKRoundingPerYear(predictions_ENG_summary$predicted_count_2019))),
linetype = "solid") +
geom_line(data = filter(predictions_ENG, date > "2021-01-01"),
aes(date, predicted_count_2021,
colour = paste0("2021 - ",
CRUKRoundingPerYear(predictions_ENG_summary$predicted_count_2021))),
linetype = "solid") +
annotate("segment",
x = as.Date("2024-06-30"), xend = as.Date("2024-06-30"),
y = 15000, yend = 38000,
colour = "#BBC7D9", linewidth = 1, linetype = "dashed") +
geom_line(aes(date, count), linewidth=0.5)+
labs(title = "England: 31-day model") +
theme_minimal() +
scale_x_date(date_breaks = "2 years", date_labels = "%Y",
limits = c(as.Date("2010-01-01"), as.Date("2029-06-30"))) +
scale_color_discrete(name = "Model and prediction") +
theme(axis.text.x = element_text(angle = 45), legend.position = "bottom")
eng_graph <- plot_ly(predictions_ENG) %>%
add_trace(x = ~date,
y = ~predicted_count_2009,
name = paste0("2009 - ",
CRUKRoundingPerYear(predictions_ENG_summary$predicted_count_2009),
" people"),
line = list(color = "#ff0087", dash = 'dash'),
type = 'scatter',
mode = 'lines') %>%
add_trace(data = filter(predictions_ENG, date >= "2019-01-01"),
x = ~date,
y = ~predicted_count_2019,
name = paste0("2019 - ",
CRUKRoundingPerYear(predictions_ENG_summary$predicted_count_2019),
" people"),
line = list(color = "#009cee", dash = 'dash'),
type = 'scatter',
mode = 'lines') %>%
add_trace(data = filter(predictions_ENG, date >= "2021-01-01"),
x = ~date,
y = ~predicted_count_2021,
name = paste0("2021 - ",
CRUKRoundingPerYear(predictions_ENG_summary$predicted_count_2021),
" people"),
line = list(color = "#00007e", dash = 'dash'),
type = 'scatter',
mode = 'lines') %>%
add_trace(data = filter(predictions_ENG, date >= "2009-01-01"),
x = ~date,
y = ~count,
name = paste0("Historical data"),
line = list(color = "#000000"),
type = 'scatter',
mode = 'lines') %>%
config(displaylogo = FALSE,
toImageButtonOptions = list(
format = "png",
filename = paste0("England 31 day models"),
width = 1400,
height = 600)) %>%
layout(
showlegend = TRUE,
legend = list(
orientation = "h",
y = -0.1,
title = list(text = "Model starting year and predicted number of patients over next parliamentary term",
font = list(family = "Poppins", color = "#000000", size = 18),
side = "top"),
font = list(family = "Poppins", color= "#000000", size= 14)
),
yaxis = list(
title = "Number of people",
showgrid = T,
zeroline = T,
tickmode = 'auto',
tickformat = '~s',
nticks = 5,
range = c(0, 40000),
titlefont = list(family = "Poppins", color = "#000000", size = 18),
tickfont = list(family = "Poppins", color= "#000000", size= 14)),
xaxis = list(
title = "",
range = c("2010-01-01", "2030-01-01"),
tickmode = 'auto',
nticks = 12,
showgrid = F,
tickfont = list(family = "Poppins", color= "#000000", size= 14),
titlefont = list(family = "Poppins", color = "#000000", size = 18)),
title = list(text = "Number of people starting treatment on the 31-day pathway<br><sup>England</sup>",
font = list(family = "Progress Medium",
color = "#000000",size = 24),
xref = 'container',
xanchor = 'left',
x = 0.01),
margin = list(t = 120, b = 30) #Set margin sizes with t, b, r, l.
)
eng_graphIn England between July 2024 and June 2029 there would be around 1.8 million people starting cancer treatment.
62 day wait
Cleaning data for 62 day wait
CWT62 <- read_csv("data/CWT_62day_By cancer site_2024-09-05.csv",
show_col_types = FALSE) # From sst hub
## Filtering and tidying names of main data
CWT62ENG <- CWT62 %>%
filter(Country == "England" & RefRoute == "Urgent Suspected Cancer") %>%
select(-Country) %>%
clean_names() %>%
mutate(date = format(floor_date(parse_date_time(date, orders = "my"),
unit = "month")) %>%
as.Date(format = "%Y-%m-%d"))
## Old other routes data NB: all sites
CWT_historic_other_routes_allsite <- rbind(
(readxl::read_xlsx("data/Cancer-Waiting-Times-National-Time-Series-Oct-2009-Sep-2023-with-Revisions.xlsx",
sheet = 2, range="AY4:BC172") %>%
mutate(ref_route = "Screening")), #
(readxl::read_xlsx("data/Cancer-Waiting-Times-National-Time-Series-Oct-2009-Sep-2023-with-Revisions.xlsx",
sheet =2, range = "BF4:BJ172") %>%
mutate(ref_route = "Consultant Upgrade"))
) %>%
clean_names() %>%
mutate(date = as.Date(monthly, format = "%Y-%m-%d"),
factor = "All") %>%
select(date, factor, ref_route, total, within_standard, outside_standard) %>%
group_by(date, ref_route, factor) %>%
summarise(
count = round_half_up(sum(total)),
within_target = round_half_up(sum(within_standard)),
outside_target = round_half_up(sum(outside_standard)),
percent_within_target = within_target / count
)
CWT62ENG <- bind_rows(CWT62ENG, CWT_historic_other_routes_allsite)
rm(CWT_historic_other_routes_allsite)
#Sep23-recent new routes data
newroutes_data <- read_csv("data/CWT_62day_By cancer site_2024-09-05.csv",
show_col_types = FALSE) %>%
clean_names() %>%
filter(ref_route %in% c("Consultant Upgrade", "Screening")
| date %in% c("Oct-23", "Nov-23", "Dec-23", "Jan-24", "Feb-24",
"Mar-24", "Apr-24", "May-24", "Jun-24")) %>%
filter(!factor %in% c("Urological (excl. prostate)","Prostate",
"Gynaecological", "Lymphoma",
"Haematological (excl. lymphoma)", "Head and neck",
"Other (new grouping)", "Oesophagus and stomach",
"Hepatobiliary") ) %>% #download includes both separate and combined routes for other and uro so below can remove the separate new sites
select(date, ref_route, total = count, within_target, outside_target)%>%
mutate(date = format(floor_date(parse_date_time(date, orders = "my"),
unit = "month")) %>%
as.Date(format = "%Y-%m-%d"),
factor = "All") %>%
group_by(date, ref_route, factor) %>%
summarise(
count = round_half_up(sum(total)),
within_target = round_half_up(sum(within_target)),
outside_target = round_half_up(sum(outside_target)),
percent_within_target = within_target / count
)
CWT62ENG <- bind_rows(CWT62ENG, newroutes_data)
rm(newroutes_data)
## Fixing site names and grouping
CWT62ENG <- CWT62ENG %>%
select(-standard_hit, -additional_patients, -target_last_met,
-rank, -standard) %>%
filter(!factor %in% c("Urological (excl. prostate)", "Prostate",
"Gynaecological", "Lymphoma",
"Haematological (excl. lymphoma)", "Head and neck",
"Other (new grouping)", "Oesophagus and stomach",
"Hepatobiliary") ) %>% #download includes both separate and combined routes for other and uro so below can remove the separate new sites
group_by(factor, ref_route, date) %>%
summarise(
count = sum(count),
within_target = sum(within_target),
outside_target = round_half_up(sum(outside_target), digits = 0),
percent_within_target = within_target / count
)
CWT62ENG$ref_route <- make_clean_names(CWT62ENG$ref_route, allow_dupes = TRUE)
##Split into list with each element a df a site subset
CWT62ENGsplit <- CWT62ENG %>%
group_by(ref_route, factor) %>%
group_split(.keep = FALSE)
##Adding names to each df list element
names(CWT62ENGsplit) <- CWT62ENG %>%
group_by(ref_route, factor) %>%
group_keys() %>%
unite("name", ref_route:factor, sep = ":") %>%
pull(name)62 day modelling
## Modelling
predictions_ENG <- CWT62ENGsplit %>%
enframe(name = "list", value = "data") %>% # converts lists to single df
mutate(
# function to add 60 blank months to each df list element
data = map(data, \(df){
df <- df %>%
mutate(date = as.Date(date),
month_no = 1:nrow(df)) %>%
select(date, count, outside_target, month_no) %>%
arrange(date)
last_date <- max(df$date)
last_no <- max(df$month_no)
#future_dates <- seq.Date(from = last_date + months(1), by = "month", length.out = 60)
future_dates <- data.frame(date = seq.Date(from = last_date + months(1),
by = "month", length.out = 69),
month_no = seq(last_no + 1,
last_no + 69,
length.out=length((last_no+1):(last_no + 69))),
outside_target = NA,
count = NA)
bind_rows(df, future_dates)
}),
# Fits gaussian regression models to each list element using non NA i.e. historic data for COUNT in order to be used to offset the outside_target model
count_model_09 = map(data, \(df) glm(count ~ month_no, gaussian(),
data = df %>%
filter(!is.na(count)))),
count_model_19 = map(data, \(df) glm(count ~ month_no, gaussian(),
data = df %>%
filter(!is.na(count)
& date > "2019-01-01"))),
count_model_21 = map(data, \(df) glm(count ~ month_no, gaussian(),
data = df %>%
filter(!is.na(count)
& date > "2021-01-01"))),
count_model_precovid = map(data, \(df) glm(count ~ month_no, gaussian(),
data = df %>%
filter(!is.na(count)
& date < "2020-03-01"))),
# function to use the models to predict the NA rows for count
data = map2(count_model_09, data, \(mod, df) {
data <- df
data$predicted_count_09 <- predict(mod, newdata = data, type = "response")
data$predicted_count_09 <- data$predicted_count_09
# data$predicted_count_09[!is.na(data$count)] <- NA #Remove predictions for data we already have as redundant
data
}),
data = map2(count_model_19, data, \(mod, df) {
data <- df
data$predicted_count_19 <- predict(mod, newdata = data, type = "response")
data$predicted_count_19 <- data$predicted_count_19
# data$predicted_count_19[!is.na(data$count)] <- NA #Remove predictions for data we already have as redundant
data
}),
data = map2(count_model_21, data, \(mod, df) {
data <- df
data$predicted_count_21 <- predict(mod, newdata = data, type = "response")
data$predicted_count_21 <- data$predicted_count_21
#data$predicted_count_21[!is.na(data$count)] <- NA #Remove predictions for data we already have as redundant
data
}),
data = map2(count_model_precovid, data, \(mod, df) {
data <- df
data$predicted_count_precovid <- predict(mod, newdata = data,
type = "response")
data$predicted_count_precovid <- data$predicted_count_precovid
#data$predicted_count_precovid[!is.na(data$count)] <- NA #Remove predictions for data we already have as redundant
data
})
) %>%
unnest(data) %>%
select(-count_model_09, -count_model_19,
count_model_21, count_model_precovid) %>%
group_by(list) %>%
mutate(performance = 1-(outside_target/count),
average_6m_count = rollmean(count, 6, fill = NA,
align = "right"),
average_6m_outside = rollmean(outside_target, 6, fill = NA,
align = "right"),
average_6m_perf = 1-(average_6m_outside/average_6m_count),
multiplier = last(na.omit(average_6m_perf))) %>%
ungroup() %>%
separate(list, sep = ":", into = c("ref_route", "cancer_type")) %>%
mutate(predicted_outside = predicted_count_09 * (1-multiplier))
## TO see last X period averages:
# predictions_ENG %>% filter(date == "2024-06-01")
## To get new aggregate number
#Projected patients starting treatment over 5 years
England_count <- sum(filter(predictions_ENG, date >= "2024-07-01"
& date < "2029-07-01")$predicted_count_09,
na.rm = TRUE)
#Projected number of patients needed to meet target
England_outside <- predictions_ENG %>%
filter(date > "2024-06-01" & date < "2029-07-01") %>%
mutate(standard = 0.85,
additional_patients = (standard - multiplier) * predicted_count_09) %>% #Adds in how many patients needed to meet target
summarise(additional_patients = sum(additional_patients)) %>%
pull()
predictions_ENG_all <- predictions_ENG %>% mutate(country = "England") %>%
group_by(date, month_no, country) %>%
summarise(count = sum(count),
outside_target = sum(outside_target),
predicted_count_09 = sum(predicted_count_09),
predicted_count_19 = sum(predicted_count_19),
predicted_count_21 = sum(predicted_count_21),
predicted_count_precovid = sum(predicted_count_precovid)) %>%
ungroup() %>%
mutate(performance = 1-(outside_target/count),
average_6m_count = rollmean(count, 6, fill = NA,
align = "right"),
average_6m_outside = rollmean(outside_target, 6, fill = NA,
align = "right"),
average_6m_perf = 1-(average_6m_outside/average_6m_count),
multiplier = last(na.omit(average_6m_perf))) %>%
mutate(predicted_outside = predicted_count_09 * (1-multiplier),
ref_route = NA,
cancer_type = "All")
# Annual change figure for England ------
EnglandAnnualFigures <- predictions_ENG_all %>%
mutate(year = as.numeric(format(date,'%Y'))) %>%
filter(date >= "2023-07-01" & date < "2029-07-01") %>%
mutate(year = case_when(date > "2023-06-01"
& date < "2024-07-01" ~ "2023/24",
date > "2024-06-01"
& date < "2025-07-01" ~ "2024/25",
date > "2025-06-01"
& date < "2026-07-01" ~ "2025/26",
date > "2026-06-01"
& date < "2027-07-01" ~ "2026/27",
date > "2027-06-01"
& date < "2028-07-01" ~ "2027/28",
date > "2028-06-01"
& date < "2029-07-01" ~ "2028/29")) %>%
group_by(year) %>%
mutate(count = coalesce(count, predicted_count_09)) %>%
summarise(count = sum(count))
EnglandAnnualFigures <- EnglandAnnualFigures %>%
mutate(percent_inc = count / EnglandAnnualFigures[EnglandAnnualFigures$year == "2023/24",]$count * 100,
count_diff = count - lag(count, n = 1L),
perc_diff = percent_inc - lag(percent_inc, n = 1L),
yearno = row_number())
YearInc <- glm(log(count) ~ yearno, gaussian(), data = EnglandAnnualFigures)
#The annual %age increase
coef <- (exp(YearInc$coef[2])-1)*100
EnglandAnnualFigures <- EnglandAnnualFigures %>%
mutate(increase = (coef / 100) * lag(count, n = 1L),
new_total = count+ increase)62 day graph
predictions_ENG_summary <- predictions_ENG_all %>%
filter(date > "2024-06-01" & date < "2029-07-01") %>%
summarise(predicted_count_09 = sum(predicted_count_09),
predicted_count_19 = sum(predicted_count_19),
predicted_count_21 = sum(predicted_count_21))
plot_ly(predictions_ENG_all) %>%
add_trace(x = ~date,
y = ~predicted_count_09,
name = paste0("2009 - ",
CRUKRoundingPerYear(predictions_ENG_summary$predicted_count_09),
" people"),
line = list(color = "#ff0087", dash = 'dash'),
type = 'scatter',
mode = 'lines') %>%
add_trace(data = filter(predictions_ENG_all, date >= "2019-01-01"),
x = ~date,
y = ~predicted_count_19,
name = paste0("2019 - ",
CRUKRoundingPerYear(predictions_ENG_summary$predicted_count_19),
" people"),
line = list(color = "#009cee", dash = 'dash'),
type = 'scatter',
mode = 'lines') %>%
add_trace(data = filter(predictions_ENG_all, date >= "2021-01-01"),
x = ~date,
y = ~predicted_count_21,
name = paste0("2021 - ",
CRUKRoundingPerYear(predictions_ENG_summary$predicted_count_21),
" people"),
line = list(color = "#00007e", dash = 'dash'),
type = 'scatter',
mode = 'lines') %>%
add_trace(data = filter(predictions_ENG_all, date >= "2009-01-01"),
x = ~date,
y = ~count,
name = paste0("Historical data"),
line = list(color = "#000000"),
type = 'scatter',
mode = 'lines') %>%
#eng_graph <- ggplotly(eng_graph) %>%
config(displaylogo = FALSE,
toImageButtonOptions = list(
format = "png",
filename = paste0("England 62 day models"),
width = 1400,
height = 600)) %>%
layout(
showlegend = TRUE,
legend = list(
orientation = "h",
y = -0.1,
title = list(text = "Model starting year and predicted number of patients over next parliamentary term",
font = list(family = "Poppins", color = "#000000", size = 18),
side = "top"),
font = list(family = "Poppins", color= "#000000", size= 14)
),
yaxis = list(
title = "Number of people",
showgrid = T,
zeroline = T,
tickmode = 'auto',
tickformat = '~s',
nticks = 5,
range = c(0, 40000),
titlefont = list(family = "Poppins", color = "#000000", size = 18),
tickfont = list(family = "Poppins", color= "#000000", size= 14)),
xaxis = list(
title = "",
range = c("2010-01-01", "2030-01-01"),
tickmode = 'auto',
nticks = 12,
showgrid = F,
tickfont = list(family = "Poppins", color= "#000000", size= 14),
titlefont = list(family = "Poppins", color = "#000000", size = 18)),
title = list(text = "Number of people starting treatment on the 62-day pathway<br><sup>England</sup>",
font = list(family = "Progress Medium",
color = "#000000",size = 28),
xref = 'container',
xanchor = 'left',
x = 0.01),
margin = list(t = 75, b = 10) #Set margin sizes with t, b, r, l.
) 62 day target predictions
dataP <- predictions_ENG_all %>%
group_by(date, cancer_type) %>%
summarise(count = sum(count),
outside_target = sum(outside_target),
predicted_count = sum(predicted_count_09),
predicted_outside = sum(predicted_outside)) %>%
select(date, cancer_type, count, outside_target,
predicted_count, predicted_outside ) %>%
pivot_longer(
cols = c(count, outside_target, predicted_count, predicted_outside),
names_to = "type",
values_to = "number"
) %>%
mutate(
label = str_to_sentence(type) # Left this in case we want to adjust later
) %>%
filter(type == "count"
| type == "outside_target"
| (type == "predicted_count" & date > as.Date("2024-06-01"))
| (type == "predicted_outside" & date > as.Date("2024-06-01")))
colour_mapping <- c("Count" = "#000000",
"Predicted_count" = "#e60078", #bce4fa",
"Outside_target" = "#777777",
"Predicted_outside" = "#009cee" #"#f088b6"
)
dataP <- dataP %>%
mutate(line_type = case_when(type == "Count" ~ "solid",
type == "Predicted_count" ~ "dashed",
type == "Outside_target" ~ "solid",
type == "Predicted_outside" ~ "dashed"))
p <- ggplot(dataP) +
geom_line(aes(x = date, y = number, colour = label, linetype = "solid"), linewidth = 0.7) +
scale_colour_manual(values = colour_mapping) +
#scale_x_date(breaks = "12 months", date_labels = "%b %Y", expand = expansion(mult = c(0, 0.03))) +
scale_x_date(date_breaks = "2 years", date_labels = "%Y", limits = c(as.Date("2010-01-01"), as.Date("2029-06-30"))) +
scale_y_continuous(labels = scales::comma,
expand = expansion(mult = c(0, 0.03))) +
# labs(title = "Number of people starting treatment outside of the 62-day target window") +
theme_minimal() +
theme(axis.text.x = element_text(#angle = 45,
hjust = 1)) +
guides(linetype = "none")
ciTitleFont <- list(family = "Progress Medium", color = "#000000",size = 24)
ciAxisFont <- list(family = "Poppins", color = "#000000", size = 18)
ciTickFont <- list(family = "Poppins", color= "#000000", size = 14)
P <- ggplotly(p) %>%
layout(yaxis = list(title = "Number of patients",
titlefont = ciAxisFont,
tickfont = ciTickFont,
standoff = 20,
tickmode = 'auto',
tickformat = '~s',
nticks = 5,
range = c(0,30000),
showgrid = TRUE,
zeroline = T,
zerolinecolor = '#000000'
),
xaxis = list(title = "",
titlefont = ciAxisFont,
tickfont = ciTickFont,
showgrid = F
),
title = list(
text = "Number of people starting treatment outside of the 62-day target window<br><sup>England</sup>",
font = ciTitleFont,
xref = 'container',
xanchor = 'left',
x = 0.01),
legend = list(
orientation = "h",
y = -0.1,
title = list(text = ""),
font = list(family = "Poppins", color= "#000000", size= 14)
),
margin = list(t = 80, r = 50, b = 150)
) %>%
config(displaylogo = FALSE,
toImageButtonOptions = list(
format = "png",
filename = "62day_projections_ENG",
width = 1400,
height = 600))
P$x$data[[1]]$name <- "Number of people starting treatment"
P$x$data[[2]]$name <- "Number of people starting treatment\noutside of the 62 day window"
P$x$data[[3]]$name <- "Projected number of people starting\ntreatment"
P$x$data[[4]]$name <- "Projected number of people starting\ntreatment outside of the 62 daywindow"
P$x$layout$legend$title <- ""
P[["x"]][["data"]][[3]][["line"]][["dash"]] <- 'dot'
P[["x"]][["data"]][[4]][["line"]][["dash"]] <- 'dot'
PIf the government does not ensure the NHS is equipped to deal with this, more than 300,000 cancer patients who are urgently referred could wait longer than they should for cancer treatment between now and 2029.
Key learnings
Modelling doesn’t need to be complex, it needs to make sense.
Initially used a poisson model approach with some adjustments, but the results didn’t look reasonable. With time pressure of an election cycle, stripping it back to something simpler meant we could still deliver results and have an influence.
What would I do differently?
If we had more time, I would explore other modelling approaches - binomial or GAM
Had done similar projections for each devolved nation, and local areas. Would have been great to explore these more - use shiny so people could search for their area and see how it compares to the national picture.